Multi-Speaker Language Modeling
Abstract
Conventional language models represent the words of only one speaker at a time, even in conversational tasks such as meetings and telephone calls. In such settings, however, speakers influence one another. To recover this missing inter-speaker information, we present a novel approach to conversational language modeling that considers words from other speakers when predicting the words of the current one. By adding only one additional word from other speakers to the normal trigram context, our new multi-speaker language model (MSLM) gives a 3.9% perplexity reduction on the Switchboard corpus and a 10.3% perplexity reduction on the ICSI Meeting Recorder corpus. Class-based multi-speaker language models improve on this further. We develop two new conditional word clustering algorithms in this framework, and with them we achieve a 5.7% perplexity reduction on Switchboard and a 12.2% reduction on the ICSI Meeting Recorder data.
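To make the idea of an extended context concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the other-speaker word is chosen or how the model is smoothed, so this sketch assumes the most recent word spoken by any different speaker and uses simple add-alpha smoothing. The names `contexts`, `CountLM`, `BOS`, and `NO_WORD` are hypothetical.

```python
import math
from collections import Counter, defaultdict

BOS = "<s>"         # assumed padding symbol for a speaker's first words
NO_WORD = "<none>"  # assumed placeholder when no other speaker has spoken yet


def contexts(tokens):
    """Yield ((w2, w1, other), word) from time-ordered (speaker, word) tokens.

    w2, w1 are the current speaker's two previous words; `other` is the most
    recent word from a different speaker (an assumption made for this sketch).
    """
    history = defaultdict(lambda: (BOS, BOS))
    seen = []
    for speaker, word in tokens:
        w2, w1 = history[speaker]
        other = next((w for s, w in reversed(seen) if s != speaker), NO_WORD)
        yield (w2, w1, other), word
        history[speaker] = (w1, word)
        seen.append((speaker, word))


class CountLM:
    """Add-alpha smoothed count model over either context type."""

    def __init__(self, pairs, use_other=True, alpha=1.0):
        self.use_other = use_other
        self.alpha = alpha
        self.ngram = Counter()
        self.ctx = Counter()
        self.vocab = set()
        for ctx, word in pairs:
            key = self._key(ctx)
            self.ngram[key, word] += 1
            self.ctx[key] += 1
            self.vocab.add(word)

    def _key(self, ctx):
        # The single-speaker baseline drops the other-speaker word.
        return ctx if self.use_other else ctx[:2]

    def prob(self, ctx, word):
        key = self._key(ctx)
        size = len(self.vocab) + 1  # one extra slot reserved for unseen words
        return (self.ngram[key, word] + self.alpha) / (
            self.ctx[key] + self.alpha * size
        )

    def perplexity(self, pairs):
        pairs = list(pairs)
        log_sum = sum(math.log(self.prob(c, w)) for c, w in pairs)
        return math.exp(-log_sum / len(pairs))
```

Training one `CountLM` with `use_other=True` and another with `use_other=False` on the same `contexts(...)` output, then comparing their `perplexity` on held-out conversation, mirrors the kind of comparison the abstract reports. A toy model like this says nothing about the reported figures.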
Similar resources
Multi-stream language identification using data-driven dependency selection
The most widespread approach to automatic language identification in the past has been the statistical modeling of phone sequences extracted from speech signals. Recently, we have developed an alternative approach to LID based on n-gram modeling of parallel streams of articulatory features, which was shown to have advantages over phone-based systems on short test signals whereas the latter achi...
Speaker and language adaptive training for HMM-based polyglot speech synthesis
This paper proposes a novel technique for speaker and language adaptive training for HMM-based statistical parametric polyglot speech synthesis. Language-specific context-dependencies in the system are captured using CAT with cluster-dependent decision trees. Acoustic variations caused by speaker characteristics are handled by CMLLR-based transforms. This framework allows multi-speaker/multi-la...
HMM-based polyglot speech synthesis by speaker and language adaptive training
This paper describes a technique for speaker and language adaptive training (SLAT) for HMM-based polyglot speech synthesis and its evaluations on a multi-lingual speech corpus. The SLAT technique allows multi-speaker/multi-language adaptive training and synthesis to be performed. Experimental results show that the SLAT technique achieves better naturalness than both speaker-adaptively trained l...
Multi-Language Multi-Speaker Acoustic Modeling for LSTM-RNN Based Statistical Parametric Speech Synthesis
Building text-to-speech (TTS) systems requires large amounts of high quality speech recordings and annotations, which is a challenge to collect especially considering the variation in spoken languages around the world. Acoustic modeling techniques that could utilize inhomogeneous data are hence important as they allow us to pool more data for training. This paper presents a long short-term memo...
Uniform Multilingual Multi-Speaker Acoustic Model for Statistical Parametric Speech Synthesis of Low-Resourced Languages
Acquiring data for text-to-speech (TTS) systems is expensive. This typically requires large amounts of training data, which is not available for low-resourced languages. Sometimes small amounts of data can be collected, while often no data may be available at all. This paper presents an acoustic modeling approach utilizing long short-term memory (LSTM) recurrent neural networks (RNN) aimed at p...